Clone the git repository.
$ git clone https://github.com/ontop/vig.git
To build the project, run the following bash script from the vig folder:
$ ./build.sh
If the build script ran successfully, you should see output like the following:
[SCRIPT] This is your config information:
[SCRIPT] resources folder: ${HOME}/resources
[SCRIPT] configuration file: ${HOME}/resources/configuration.conf
[SCRIPT] csvs folder: ${HOME}/resources/csvs
The configuration.conf file contains the configuration of the generator. It looks as follows:
# ====================
# Mandatory parameters
# ====================
jdbc-connector jdbc:mysql # The only connector supported at the moment.
database-url <addr:port/dbName> # The address, port, and name of the source database.
database-user <user> # The username for accessing the source database.
database-pwd <pwd> # The password for accessing the source database.
# VIG generation mode. Either DB or OBDA (default). Since VIG 1.1, OBDA mode is preferred.
# The NPD Benchmark, from v1.8.0 onwards, should be run in OBDA mode.
# OBDA mode also reads statistics from the mappings, and supports fixed-domain columns.
mode <DB|OBDA>
random-gen [true|false] # If true, then the generator will behave as a pure
# random generator. DB-mode only.
obda-file <path/mappings.obda> # The location of the mapping file in .obda format.
                               # IMPORTANT: Connection parameters should also be set
                               # in this file. OBDA-mode only.
scale <value> # Scaling factor value. Default: 1.0
# ======================================================================================
# Advanced parameters. Commented out, as the default values are usually enough. You can
# check the default values by running VIG with the --help option.
# ======================================================================================
# fixed <table1.col1> <table2.col2> ... # Manually specified fixed-domain columns. OBDA-mode only.
# non-fixed <table1.col1> <table2.col2> ... # Manually specified non-fixed-domain columns. OBDA-mode only.
# Time (ms) allowed for the columns-cluster analysis. Given a columns cluster {A,B,C} of columns A, B,
# and C, VIG tries to compute the cardinality of all possible intersections between these three columns,
# namely AB, AC, BC, and ABC. If the timeout is reached, say, while evaluating the cardinality of the
# intersection ABC, then that cardinality is assumed to be zero (hence, no value will be generated in
# the intersection of the columns A, B, and C). OBDA-mode only.
# cc-timeout <value>
# tables <table1> <table2> ... # Generate only the specified tables.
# columns <table1.col1> <table2.col2> ... # Generate only the specified columns.
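For reference, a minimal DB-mode configuration might look as follows (placeholder host, database name, and credentials; only parameters listed above are used):
jdbc-connector jdbc:mysql
database-url localhost:3306/myDb
database-user myUser
database-pwd myPwd
mode DB
random-gen false
scale 1.5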
For a complete list of the VIG command-line options, refer to the help utility by running
$ java -jar vig.jar --help
or
$ java -jar vig.jar --help-verbose
Command-line options override the parameters provided through the configuration file.
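For example, the location of the resources folder can be overridden with the --res option (also used in the walkthrough below); myResources is a placeholder path here:
$ java -jar vig.jar --res="myResources"
Whether an individual configuration parameter has a corresponding flag can be checked through --help.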
After setting up the configuration file, simply execute the jar:
$ java -jar vig.jar
1) Launch the build script from the vig directory:
./build.sh
2) Create the desired source database, e.g.
$ mysql --user="username" --host="address" --password="password" npdSource < npd-data-dump.sql
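If the npdSource database does not exist yet, create it before loading the dump; the dump itself might not contain a CREATE DATABASE statement:
$ mysql --user="username" --host="address" --password="password" -e "CREATE DATABASE npdSource"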
3) Set up the configuration file in resources/configuration.conf:
jdbc-connector jdbc:mysql
database-url db_host/db_name
database-user user
database-pwd pwd
mode OBDA
obda-file resources/npd-v2-ql_a.obda
scale 2
In our configuration, we have set the scaling factor to 2 through the parameter scale 2.
4) If we have set the mode to OBDA, as in our example, then we need to put the mappings in the location specified in the configuration file (resources/npd-v2-ql_a.obda in our example). Moreover, we need to set the connection parameters in the mapping file:
[SourceDeclaration]
sourceUri http://sws.ifi.uio.no/vocab/npd-v2
connectionUrl jdbc:mysql://db_host/db_name
username user
password pwd
driverClass com.mysql.jdbc.Driver
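Before launching VIG, it can be useful to verify that these connection parameters work, e.g. with a quick check through the mysql client (same placeholder values as above):
$ mysql --user="user" --host="db_host" --password="pwd" db_name -e "SELECT 1"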
5) We are now ready to run VIG, specifying the location of the resources folder.
$ java -jar vig.jar --res="resources"
6) The csv files will be generated in the directory resources/csvs.
7) Import the csv files into the RDBMS.
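For MySQL, one option is LOAD DATA LOCAL INFILE; the sketch below assumes comma-separated files named after the target tables (someTable is a hypothetical name; check the generated files for the actual naming, delimiter, and quoting):
$ mysql --local-infile=1 --user="user" --host="db_host" --password="pwd" db_name \
    -e "LOAD DATA LOCAL INFILE 'resources/csvs/someTable.csv' INTO TABLE someTable FIELDS TERMINATED BY ','"
Note that the MySQL server must also have local_infile enabled for this to work.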